Method and processing unit for selective value prediction using data cache hit/miss information and/or dependency depth information

ABSTRACT

The present invention relates to a processing unit for executing instructions in a computer system and to a method in such a processing unit. According to the present invention a decision is made whether or not to base execution on a value prediction (P), wherein the decision is based on information associated with the estimated time gain of execution based on a correct prediction. According to an embodiment of the present invention the decision regarding whether or not to execute speculatively is based on information regarding whether a cache hit or a cache miss is detected in connection with a load instruction. In an alternative embodiment of the present invention the decision is based on information regarding the dependency depth of the load instruction, i.e. the number of instructions that are dependent on the load.

This application is the U.S. national phase of international application PCT/SE02/00298 filed Feb. 21, 2002 which designated the U.S., and claims the priority of Swedish patent application No. 0102564-2 filed Jul. 19, 2001, the entire contents of both of which are hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention relates in general to computer systems, and more specifically to the design and functioning of a processor.

RELATED ART AND OTHER CONSIDERATIONS

The performance of computer processors has been tremendously enhanced over the years. This has been achieved both by means of making operations faster and by means of increasing the parallelism of the processors, i.e. the ability to execute several operations in parallel. Operations can for instance be made faster by means of improving transistors to make them switch faster or optimizing the design to minimize the level of logic needed to implement a given function. Techniques for parallelism include pipelining and superscalar techniques. Pipelined processors overlap instructions in time on common execution resources. Superscalar processors overlap instructions in space on separate execution resources. Many processor architectures are combinations of a pipelined and a superscalar processor.

Today, one of the major limiting factors in processor performance is memory speed. In order to fully benefit from the processor's speed, the processor must receive data to operate on from data memory at a speed that keeps the processor busy continuously. This problem can be attacked by supplying a smaller and faster memory, called a cache, close to the processor. The cache reduces the delay associated with memory access by storing subsets of the memory data that can be quickly read and modified by the processor. There are many different methods that can be used to map data from the memory to the cache. If the temporal and spatial locality is beneficial the cache may help to circumvent the underlying problem of divergence in processor speed and memory speed.

Lately, it has been suggested to not just store data closer to the processor in a cache but also attempt to predict the value of requested data. This enables the processor to continue executing speculatively using the predicted value while waiting for the true value to be delivered from the memory system. When the true value arrives the validity of the prediction is checked. If the prediction was correct the speculative execution is correct and a performance gain is realized. If the prediction was incorrect the speculative execution is incorrect and must be re-executed with the correct value. A flush of mis-speculated execution and a restart of the executed instructions that depend on the predicted value imply a time loss. Since incorrect prediction is costly, prediction confidence threshold values based on the history of earlier predictions are often used to inhibit value prediction if the likelihood of correct prediction is too small.

U.S. Pat. No. 5,781,752 discloses a processor provided with means for speculative execution. The processor has a data speculation circuit comprising a prediction threshold detector and a prediction table. The prediction table stores prediction counters that reflect the historical rate of mis-speculation for an instruction. The prediction threshold detector prevents data speculation for instructions having a prediction counter within a predetermined range.

“Predictive Techniques for Aggressive Load Speculation”, Reinman Glenn et al., published in the Proceedings of the Annual 31st International Symposium on Microarchitecture, December 1998 describes a number of methods for load speculation, one of them being value prediction. It is further described that the method of value prediction uses a confidence counter to decide when to speculate a load. The counter is increased if a correct prediction occurs and decreased if the prediction is incorrect. Speculative execution will only take place if the counter is above a predict threshold.
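The confidence-counter scheme summarized above can be illustrated with a minimal sketch. It is not taken from the cited paper: the class name, the saturation bounds and the threshold value are assumptions chosen for illustration only.

```python
class ConfidenceCounter:
    """Hypothetical saturating confidence counter for one load instruction."""

    def __init__(self, max_value=3, threshold=1):
        # max_value and threshold are illustrative assumptions.
        self.value = 0
        self.max_value = max_value
        self.threshold = threshold

    def update(self, prediction_was_correct):
        # Increase on a correct prediction, decrease on an incorrect one,
        # saturating at 0 and max_value (saturation is an assumption).
        if prediction_was_correct:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)

    def should_speculate(self):
        # Speculate only if the counter is above the predict threshold.
        return self.value > self.threshold
```

With these assumed parameters, a load must be predicted correctly twice before the counter permits speculation, and a single mis-prediction afterwards inhibits it again.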

Lipasti M. H. et al., “Value Locality and Load Value Prediction”, Proceedings of the ACM Conference on Architectural Support for Programming Languages and Operating Systems, 1996 describes the concept of value locality which is defined as the likelihood of a previously-seen value recurring repeatedly within a storage location. It is discussed that it might be beneficial to introduce a load value prediction unit if the value locality is significant. The load value prediction unit comprises a load classification table for deciding which predictions are likely to be correct. The load classification table includes counters for load instructions. The counters indicate the success rate of previous predictions and are incremented for correct predictions and decremented otherwise. Based on the value of the counter a load instruction is classified as unpredictable, predictable or constant. Speculative execution is prevented for load instructions that are classified as unpredictable.

Calder B. et al., “Selective Value Prediction”, Proceedings of the 26th International Symposium on Computer Architecture, May 1999 describes techniques for selectively performing value prediction. One such technique is instruction filtering, which filters which instructions put values into the value prediction table. Filtering techniques that are discussed include filtering based on instruction type and giving priority to instructions belonging to the data dependence path in the processor's active instruction window.

Tune E. et al., “Dynamic Prediction of Critical Path Instructions”, Proceedings of the 7th International Symposium on High Performance Computer Architecture, January 2001, and Fields B. et al., “Focusing Processor Policies via Critical-Path Prediction”, International Symposium on Computer Architecture, June 2001 describe value prediction wherein a critical path prediction is used to choose the instructions to predict.

BRIEF SUMMARY

The methods for deciding whether or not to execute speculatively based on a predicted value, which were described above, have disadvantages. Some of the methods described above are based merely on an estimation of the likelihood of correct prediction based on historical statistics. This may result in taking unnecessarily high risks. Executing speculatively is a gamble: if the prediction is correct, you win time, but if the prediction is incorrect you lose time. Thus, execution based on a predicted value implies exposure to the risk of delaying execution considerably due to a flush and restart. Since the cost of an incorrect prediction is high it is desirable to only expose the processor to the risk involved in speculative execution when it can be motivated.

Some of the methods for selective value speculation described above base the decision of whether or not to use a predicted value on instruction dependency predictions, which means that a prediction of what the possible gain may be is weighed into the decision. These methods make it possible to avoid speculative execution in some cases where it is unwise due to the possible gain being very low. However, all of the described methods of this type are rather complex and they all use predictions relating to dependency instead of true dependency information, which means that there is a degree of uncertainty involved.

The processing unit and method include relatively simple means for deciding when to execute speculatively, wherein the decision to execute speculatively is based on criteria that allow for improved management of the risks involved, as compared with the prior art wherein the decision is based merely on an estimation of the likelihood of correct prediction.

Improved risk management is afforded by means of basing the decision whether or not to execute speculatively on information associated with the estimated time gain of execution based on a correct prediction. With the prior art methods it is possible that the processing unit is exposed to the negative impact of mis-prediction also when the gain for a correct prediction is very small. It is possible to make wiser decisions by means of taking the gain for a correct prediction into account. If the cost for mis-prediction is high and the gain for correct prediction is low it is probably wise not to speculate even if the likelihood of a correct prediction is high. Since the estimated gain of correct prediction is a decision criterion it is possible to avoid speculative execution in situations where it might seem unwise to speculate but where speculative execution undoubtedly would take place if the decision was based merely on the likelihood of a correct prediction, as in the prior art methods described above.

According to an example embodiment, the decision regarding whether or not to execute speculatively is based on information regarding whether a cache hit or a cache miss is detected in connection with a load instruction. A cache hit implies that the true value corresponding to the instruction will be available shortly since the value was found in a very fast memory. If, on the other hand, a cache miss is detected it is a sign that the value must be loaded from a slower memory and that it might take a while until the true value is available. Therefore, a cache hit is a factor that weighs against speculative execution, since it implies a small performance gain for a correct prediction.

In an alternative embodiment a cache hit prediction, based on the historic likelihood of detecting a cache hit or miss for a value of a certain load instruction, is used as a factor in the speculation decision, instead of the actual detected cache hit or miss.

According to yet another alternative embodiment the decision regarding whether or not to execute speculatively is based on information regarding the true dependency depth of the load instruction, i.e. the number of instructions that are dependent on the load. If the number of dependent instructions is low it might, depending on the processor architecture, be possible to hide the latency of the load with other instructions that are independent of the load. If this is possible the gain of a correct prediction for the load will be small or none at all. The dependency depth of a certain load instruction is therefore, according to an embodiment, used as a factor in the decision regarding whether to execute the load instruction speculatively or not. According to other embodiments a predicted dependency depth is used as a factor in the decision instead of the true dependency depth.

An advantage of the present technology is that it makes it possible to improve the performance of processing units since the technology makes it possible to avoid speculative execution when the gain for a correct value prediction is too small to motivate taking the risk of mis-prediction. The cost of recovery due to mis-prediction is fairly large and it would therefore be unwise to expose the processor to the risk of a recovery when the performance gain involved in the value speculation is small. The present technology makes it possible to restrict value speculation to when the potential gain is significant.

A further advantage is that, since it makes it possible to avoid speculative execution when the performance gain is small, the cost of recovery becomes less critical. It is possible to allow more costly recovery since speculative execution is restricted to cases where the estimated performance gain of correct prediction is considerably larger than the recovery cost.

Another advantage of an embodiment is that it reduces the need for storage of value prediction history statistics as will be explained in greater detail below.

Yet another advantage of the embodiment is that the information relating to the possible time gain of a correct decision, which is used in the decision of whether or not to execute speculatively, is information that relates to the next execution of the instruction for which speculative execution is an option and not to historic information relating to earlier executions of the same instruction. Thus the confidence in making a correct decision regarding speculative execution is improved according to this embodiment.

The invention will now be described with the aid of preferred embodiments and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of selected parts of a processor that is adapted according to an example embodiment.

FIGS. 2 a–d are time diagrams that illustrate the possible gain and loss involved for speculative execution in cases where a cache hit occurs and in cases where a cache miss occurs.

FIG. 3 is a block diagram of a value prediction unit according to an example embodiment, where arrows indicate the steps involved in an embodiment of an example method.

FIG. 4 is a block diagram that shows a more detailed illustration of a buffer unit of a value prediction unit according to an example embodiment.

FIG. 5 is a schematic block diagram of a SID structure according to an embodiment and a reorder buffer (ROB).

FIGS. 6 a–6 e are schematic block diagrams that illustrate an example of how dependency depth is registered and utilized according to an embodiment of the present invention.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

FIG. 1 shows a schematic block diagram of selected parts of a processor 1. A dispatch unit 2 is shown, which determines the instructions that are to be executed next and distributes the instructions between a number of reservation stations 3 for forwarding to execution units 4. In order to execute the instructions, data may have to be fetched from memory and supplied to the execution units 4. Fetching of data from memory is handled by address calculation unit (ACU) 5 and data fetch unit 6, which output the fetched data to the reservation stations 3. The data fetch unit is able to load data from memory units such as a cache or other types of slower memory units (not shown). The processor 1 is an out-of-order processor, i.e. a processor that allows instructions to be executed out of program order. Out-of-order execution is supported by a reorder buffer (ROB) 7, which buffers results until they can be written to a register file in program order. The reservation stations 3, the execution units 4 and the ROB 7 constitute the execution engine 8 of the processor 1.

The processor 1 further comprises a value prediction unit (VPU) 9, which enables predictions of values to be fetched from memory. The predictions can be used for speculative execution. If speculative execution is to be performed the value prediction unit 9 produces a prediction P that is presented to the appropriate reservation station 3 before the true value V is received from the data fetch unit 6. The execution is carried out based on the prediction P. Flags are set to indicate that results based on the prediction P are speculative. When the true value V is received from memory the data fetch unit 6 sends this value to the VPU 9, which uses this value to check if the prediction P was correct. If the prediction P was correct (i.e. P=V) the VPU 9 sends a signal s2 to the execution engine 8 that the prediction P was correct and that speculative flags can be cleared. If the prediction P was incorrect (i.e. P≠V), the VPU 9 sends a flush order s3 to the execution engine 8, which causes all results based on the incorrect prediction to be flushed out from the ROB 7 and restarts execution based on the true value V.
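The verification step described above can be modeled in software as a rough sketch. The names `RobEntry` and `resolve_prediction` are hypothetical, and the sketch assumes for simplicity that every entry flagged as speculative depends on the single prediction being checked; a real ROB would track this per prediction.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    result: int
    speculative: bool  # set when the result is based on the prediction P


def resolve_prediction(rob, predicted, true_value):
    """Return 's2' (clear speculative flags) or 's3' (flush), updating rob in place."""
    if predicted == true_value:
        # Prediction P was correct: results become non-speculative.
        for entry in rob:
            entry.speculative = False
        return "s2"
    # Prediction P was incorrect: drop all results based on it.
    # Re-execution with the true value V is not modeled here.
    rob[:] = [entry for entry in rob if not entry.speculative]
    return "s3"
```

The return value merely names the signal that the VPU 9 would send to the execution engine 8.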

When a load instruction (i.e. an instruction to fetch data from memory) is to be executed the VPU 9 receives an index signal s1 from the dispatch unit 2, which is used to index (via some hash function) the prediction P. As described above a decision of whether or not to use the prediction P for speculative execution is often made based on the historical success rate of earlier predictions for the same value. The VPU 9 may for this purpose store a counter, or some other type of history data, which is associated with a particular value and which reflects the success rate of earlier predictions for the particular value. The index signal s1 from the dispatch unit 2 is used to retrieve the appropriate counter or history data. As mentioned above there are several known methods for keeping track of past success rate of predictions and basing the decision whether to speculate or not on this success rate.

As mentioned above, it is possible that speculation takes place in situations where it might be preferable to wait for the true value V, if the past success rate is used as the only speculation decision criterion. If the estimated gain of speculative execution based on a prediction that later turns out to be correct is small, it might be better not to speculate and instead wait for the true value V. The estimated gain of execution based on a correct prediction for a load instruction is used as a speculation decision criterion. Thereby, the risk of unwisely exposing the processor to the negative impact of mis-prediction can be reduced.

A cache hit or a cache miss gives an indication of what the gain from speculative execution might be for a particular load instruction. In an embodiment a cache hit or miss is therefore taken into consideration when the decision whether to speculate or not is made. It is possible to decide to only speculate when the load latency is large and the performance gain for correct prediction is high. A cache miss is an indication of large load latency since it indicates that the value was not found in the cache but has to be loaded from a slower memory. A cache hit on the other hand indicates that the true value will be available in this or the next few cycles since loads from the cache can be performed quickly. It is thus more advantageous to speculate when a cache miss is detected than when a cache hit is detected. FIGS. 2 a–d give an illustration of the possible gain and loss involved for speculative execution in cases where a cache hit occurs and in cases where a cache miss occurs.

FIG. 2 a illustrates a time line t for a case where a cache miss occurs and the prediction was incorrect. The speculative execution starts at time a0′. At time a1 the true value is received which shows that the prediction was incorrect and causes a restart. The restart is finished at time a0 where non-speculative execution begins based on the true value.

FIG. 2 b illustrates a time line t for a case where a cache miss occurs and the prediction was correct. The speculative execution starts at time b0′. At time b1 the true value is received which shows that the prediction was correct. Execution can continue from time b1 without having to restart and re-execute what was executed between time b0′ and time b1.

FIG. 2 c illustrates a time line t for a case where a cache hit occurs and the prediction was incorrect. The speculative execution starts at time c0′. At time c1 the true value is received which shows that the prediction was incorrect and causes a restart. The restart is finished at time c0 where non-speculative execution begins based on the true value.

FIG. 2 d illustrates a time line t for a case where a cache hit occurs and the prediction was correct. The speculative execution starts at time d0′. At time d1 the true value is received which shows that the prediction was correct. Execution can continue from time d1 without having to restart and re-execute what was executed between time d0′ and time d1.

The dotted areas in FIGS. 2 a and 2 c indicate the execution time loss due to restart. The dashed areas indicate the execution time gain due to correct value speculation. Thus a cache miss indicates a possible gain that is large compared to the possible loss, while a cache hit indicates a possible gain that is small compared to the possible loss.

By taking cache hit/miss information into account, the processor can be exposed to the risk of incurring the restart penalty (a1-a0 or c1-c0) only when the potential performance gain is large, as it is at a cache miss.
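The trade-off illustrated by the time diagrams can be made concrete with a simple expected-value calculation. This is an illustrative model only, not a formula stated in this description: the function name and all cycle counts and probabilities below are assumed numbers.

```python
def expected_benefit(p_correct, gain_cycles, penalty_cycles):
    """Expected cycles saved by speculating (negative means an expected loss).

    p_correct      -- assumed probability that the prediction P equals V
    gain_cycles    -- cycles saved when the prediction is correct
    penalty_cycles -- restart cost when the prediction is incorrect
    """
    return p_correct * gain_cycles - (1.0 - p_correct) * penalty_cycles


# Assumed numbers: same predictor accuracy and restart penalty in both cases,
# but a long load latency on a cache miss and a short one on a cache hit.
miss_case = expected_benefit(p_correct=0.9, gain_cycles=100, penalty_cycles=20)
hit_case = expected_benefit(p_correct=0.9, gain_cycles=2, penalty_cycles=20)
```

Under these assumed numbers the expected benefit is clearly positive at a cache miss and slightly negative at a cache hit, even though the likelihood of a correct prediction is the same in both cases.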

Hence according to an embodiment a cache hit signal s4 is input to the VPU 9 as shown in FIG. 1. The cache hit signal s4 includes cache hit/miss information 14 that indicates whether a cache hit or a cache miss was detected. According to an embodiment a detected cache hit is used as a speculation inhibit signal, such that the VPU 9 is prevented from presenting speculative data to the reservation stations 3 when the cache hit signal s4 indicates a cache hit. In another embodiment the cache hit/miss information 14 and history data related to the success rate of previous predictions are weighted and combined to form a decision value that indicates whether or not speculative execution should take place. The cache hit/miss information 14 is an added criterion, which is used as a factor in the speculation decision scheme. Many alternative decision schemes that take cache hit/miss information into consideration are possible as will be explained in greater detail below.

FIG. 3 shows an implementation of the VPU 9. The VPU 9 comprises a value cache 10 and a buffer unit 11. In the value cache 10, values and information related to the values are stored. The values that are stored in the value cache 10 are not the true values V but predictions P of the true values V. Each prediction P corresponds to a particular load instruction. The predictions are associated with an identity code, which is used to identify the prediction that corresponds to a particular load instruction. History data, such as counter data, related to earlier predictions is also stored in the value cache 10.

When a load instruction is to be executed the dispatch unit 2 sends the index signal s1 to the value cache 10. The index signal s1 contains an index, which for instance is a hashing of the load instruction's address or a hashing of the data address, and which helps the value cache 10 to identify the prediction P associated with the load instruction to be executed. The value cache 10 delivers the prediction P and the history data associated with the prediction to the buffer unit 11 (signals s51 and s52). The buffer unit 11 has decision logic 12 that based on predetermined rules for value prediction decides whether or not to speculate. According to an embodiment the decision logic 12 receives both history data and the cache hit signal s4 as input. If the decision logic 12 decides to speculate the prediction P is delivered (signal s61) to the execution engine 8 of the processor together with a flag (signal s62) indicating the speculative nature of the value.
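A value cache indexed by a hash of the load instruction's address might be sketched as follows. The table size, the hash function (dropping the two low address bits and taking the remainder) and the use of the full address as identity code are all assumptions for illustration; the description leaves these choices open.

```python
class ValueCache:
    """Hypothetical direct-mapped table of (identity code, prediction P, history H)."""

    def __init__(self, size=256):
        self.size = size
        self.entries = [None] * size

    def _index(self, load_address):
        # Assumed hash: discard the two low bits, then wrap to the table size.
        return (load_address >> 2) % self.size

    def lookup(self, load_address):
        """Return (prediction, history) for this load, or None if not present."""
        entry = self.entries[self._index(load_address)]
        if entry is not None and entry[0] == load_address:
            return entry[1], entry[2]
        return None

    def store(self, load_address, prediction, history):
        # The full address serves as the identity code in this sketch.
        self.entries[self._index(load_address)] = (load_address, prediction, history)
```

The identity-code check prevents a load from using the prediction of a different load that happens to hash to the same entry.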

After some period of time the true value V is delivered from the memory system and received in the buffer unit 11. The true value is compared with the prediction P kept in the buffer unit. If the prediction P was correct the clear speculative flag order s2 is sent to the execution engine 8. If the prediction P was incorrect the flush order s3 is sent instead. The buffer unit 11 also updates the history data and predicted value if needed and sends these (signals s71 and s72) to the value cache 10 for storage so that the updated data can be used in subsequent predictions.

In this particular implementation of the VPU 9, the decision logic 12 is incorporated in the buffer unit 11. FIG. 4 shows a more detailed illustration of the buffer unit 11. The buffer unit has a table 13 in which identity codes ID, predictions P and history data H are stored for ongoing predictions. The identity code ID enables the processor to keep track of the different parts of an instruction, which during execution is scattered all over the processor 1. In the buffer unit 11 the identity code ensures that the right pair of true value V and prediction P is compared.

There are two decisions to be made in the decision logic 12, indicated as parts f and g of the decision logic 12 in FIG. 4. In the part f of the logic a yes/no decision is made whether to predict or not based on the history data H and the cache hit signal s4. The part f logic may base its decision on an endless variation of rules. The rule may for instance be to predict if the cache hit signal s4 indicates a cache miss and a counter C of the history data H is above a predetermined threshold. Another example of a rule is to give weight factors to the cache hit signal s4 and the history data H and base the decision on the result of a combination of the weighted information.
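The two example rules for the part f of the logic can be sketched as follows. The function names, the threshold and the weight values are assumptions; the description only states that such rules are possible, not which parameter values to use.

```python
def rule_miss_and_threshold(cache_hit, counter, threshold=2):
    """Predict only on a cache miss with counter C above a threshold."""
    return (not cache_hit) and counter > threshold


def rule_weighted(cache_hit, counter, w_miss=2.0, w_history=1.0,
                  decision_threshold=4.0):
    """Combine weighted cache hit/miss information and history data.

    The weights and the decision threshold are illustrative assumptions.
    """
    miss_term = 0.0 if cache_hit else 1.0
    score = w_miss * miss_term + w_history * counter
    return score >= decision_threshold
```

With the first rule a cache hit always inhibits prediction. With the weighted rule a cache hit only raises the confidence required: under the assumed weights, a counter value of 2 suffices at a cache miss, whereas a value of 4 is needed at a cache hit.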

The cache hit signal s4 is received before the true value V is delivered from the memory system, but there is still some waiting time involved until the cache hit signal s4 is received. An alternative embodiment is to use a cache hit prediction s4P instead of the actual cache hit signal s4 in the decision described above. This will speed up the prediction decision at the cost of some uncertainty regarding the expected cache hit. If the cache hit prediction s4P is used instead of the actual cache hit signal s4, history data H is not only stored in respect of value predictions P but also in respect of cache hit predictions s4P. In FIG. 4, the cache hit signal s4 and the cache hit prediction s4P are shown as dashed input signals to the part f of the decision logic 12. This is to indicate the above mentioned alternatives of either using the actual cache hit signal s4 or the cache hit prediction s4P, which may be included in the history data H.

In the part g of the decision logic 12 the true value V is compared to the value prediction P to decide if the prediction P was correct. The outcome of this decision will, as explained above, either be the flush order s3 or the clear speculative flag order s2. The part g of the logic is also responsible for updating the value predictions P and history data H if necessary. The cache hit signal s4 can, depending on the implementation, be input to the g part and used as a factor to calculate counter values or saved separately for later use in the part f of the logic. How history data H is stored and updated is the subject of many research reports, and since it is not the core of the invention it is not discussed in any greater detail herein.

The embodiments discussed above were described in the context of an out-of-order processor 1 equipped with a reorder buffer 7. The present invention is however not dependent on any particular processor architecture but adapts itself to other architectures. The present invention is also applicable with different types of memory configurations. The present invention may for instance be used in multi-level cache systems where many different prediction schemes are possible depending on in which level a cache hit is indicated. The rule may for instance be to inhibit prediction if a cache hit occurs in the first level of the cache or to inhibit prediction if there is a cache hit in any of the levels of the cache. In some memory systems a cache miss is directly indicated by the address. This is the case in for instance static caches, i.e. SRAM address controlled fast memory, and memory mapped IO configurations. When a cache miss is directly indicated by the address there is no need to wait for a cache hit signal s4 or to predict a cache hit, but it is still possible to base the prediction decision on cache hit/miss information 14.

The present invention may further be used in virtual memory machines. A virtual memory machine is a processor wherein address calculation involves translating a virtual address into a physical address. The address calculation is generally supported by a translation look-aside buffer (TLB) 24. If the processor in FIG. 1 is assumed to be a virtual memory machine, the TLB 24 would typically reside in the Address Calculation Unit (ACU) 5 as an address translation cache. A miss in this cache would mean a load of page data 25 to get the physical address from memory before the request to fetch data can be sent to memory, i.e. there is additional latency before the fetch to load the value can be sent to memory. This implies that there is a fairly long latency until the true value is received from memory and that the speculative execution may give rise to a fairly large time gain. Thus a signal 26 of a cache hit or cache miss from the TLB may be used as a factor in the decision of whether or not to execute speculatively in the same way as a cache hit or miss signal in a data cache.

The invention can be used together with confidence prediction data based on earlier predictions, as described above, or without. In a multi-level cache system the different cache level hits could be combined with different confidence threshold values to produce an optimal decision based on current prediction confidence.
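One way of combining cache level hits with different confidence thresholds is sketched below. The threshold table, its values and the function name are assumptions made for illustration: a hit in a faster level demands higher confidence, and `None` means that prediction is inhibited outright.

```python
# Hypothetical thresholds: key is the cache level that hit (1 = fastest),
# or "miss" when no level hit. None inhibits prediction entirely.
LEVEL_THRESHOLDS = {1: None, 2: 3, "miss": 1}


def should_speculate(hit_level, counter, thresholds=LEVEL_THRESHOLDS):
    """Decide on speculation from the hit level and a confidence counter.

    hit_level -- 1-based cache level where the hit occurred, or None for a miss
    counter   -- confidence counter from the value prediction history
    """
    key = hit_level if hit_level is not None else "miss"
    threshold = thresholds[key]
    if threshold is None:
        return False
    return counter >= threshold
```

With this assumed table, a first-level hit never leads to speculation, a second-level hit requires a high confidence counter, and a miss in all levels permits speculation already at low confidence.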

In an example embodiment, the candidates for value prediction are chosen only among the values for which cache misses occur. There is then no need to store prediction statistics when cache hits are detected, thereby reducing the need for storage for value prediction history statistics.

The above-described embodiments utilize information from the memory system, which carries information about the expected gain from value prediction. Information from the execution engine could also be used to estimate the potential gain of value prediction. Embodiments that do this by means of taking the dependency depth into consideration are described hereinafter.

There are different numbers of instructions depending on each load instruction. If the dependency chain is short for a load instruction in an out-of-order processor it is likely that the load latency can be hidden by other instructions that do not depend on the value to be loaded. If on the other hand the dependency chain is long, it is an indication that too many instructions are awaiting the result of the load instruction for the load latency to be hidden. The estimated gain of value prediction is thus larger when the dependency chain is long than when it is short.

Thus an embodiment uses information regarding the dependency depth (i.e. information regarding the length of the dependency chain) to decide whether to predict a value or not. The decision is to only perform speculative execution based on the prediction P when the number of instructions dependent on the speculated load value is sufficiently large to motivate the risk of speculation. The length of the dependency chain that is suitable to qualify for value prediction depends on the size of the instruction window subjected to out-of-order execution.

The dependency depth information may either be a prediction of the dependency depth based on dependency chain history or the “current” dependency depth that relates to the load instruction to be executed. The advantage of using a prediction of the dependency depth is that it is fairly simple since the current depth may be difficult, or in many processor architectures impossible, to derive. On the other hand, a disadvantage of using a prediction is that the dependency depth may be mis-predicted which means that a certain degree of uncertainty is involved.

We will describe both an embodiment based on a prediction of the dependency depth and an embodiment based on the true dependency depth.

According to an example embodiment based on dependency depth prediction, the structure used to retain out-of-order execution data, such as the reorder buffer 7, stores a speculative identity field (SID) instead of a simple speculative flag. The size of the SID must accommodate the number of simultaneously active speculations. With the help of the SID, a storage structure is indexed, which builds a value dependence chain during execution. When the speculative execution for a predicted value is finished, the depth of the value dependence chain indexed in the corresponding SID is stored to be used to influence future decisions whether to speculate or not. If other types of history data H are stored also, for use in the speculation decision, the dependency depth information may be stored together with such other history data.
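Using a stored dependency depth to influence later decisions could, for example, look like the sketch below. The class name, the minimum depth and the policy of using the most recently observed depth as the prediction are assumptions, as is the choice not to speculate for loads without recorded history.

```python
class DepthHistory:
    """Hypothetical store of observed dependency depths per load instruction."""

    def __init__(self, min_depth=4):
        # min_depth is an assumed value; a suitable value would depend on
        # the size of the out-of-order instruction window.
        self.min_depth = min_depth
        self.last_depth = {}

    def record(self, load_address, depth):
        # Called when speculative execution for a predicted value has finished.
        self.last_depth[load_address] = depth

    def allows_speculation(self, load_address):
        # Assumed policy: predict the depth as the last observed depth and
        # treat loads without history as having no known dependents.
        return self.last_depth.get(load_address, 0) >= self.min_depth
```

A real implementation would also have to observe the depth of loads that have never been speculated, which this sketch does not address.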

FIG. 5 shows a schematic illustration of a SID structure 15 with vectors SID0, SID1, SID2, SIDn, alongside the reorder buffer (ROB) 7. The ROB 7 has an index called a sequence number SN. The sequence number is an identity code, which enables the processor to keep track of the different parts of the instruction during execution. The SID structure 15 is illustrated alongside the ROB 7 since the SID and the ROB are subject to the same stepping orders.

However, when implemented the SID 15 does not have to be placed close to the ROB 7, but can instead e.g. be placed close to the reservation stations 3.

When the prediction P is output from the VPU 9, the prediction is assigned a SID number corresponding to a SID vector. When the prediction P is stored in the reorder buffer 7, a hit is set in the assigned SID vector at the position that corresponds to the sequence number SN of the load instruction that was predicted. Thus the hit that is set is uniquely defined by the SID number and the sequence number SN. When the prediction P is used to execute a subsequent instruction, the result is stored in the ROB 7 and assigned another sequence number SN. The sequence number of the load instruction is then called a source sequence number and the sequence number of the result of the subsequent instruction is called a destination sequence number. If the source sequence number has a hit set in a SID vector, the hit in the SID vector that corresponds to the destination sequence number is also set to indicate a speculative data dependence. When the prediction P is verified as correct or incorrect, the speculative state is to be cleared and the SID vector that corresponds to the verified prediction is cleared so that it can be used for other predictions. Before the SID vector is cleared, the number of hits that are set in the vector is sent to the VPU 9. This number is the dependency depth D of the speculated load instruction and it is stored in the VPU 9 to form a basis for dependency depth predictions for future decisions whether to speculate or not with respect to the load instruction. The SID vectors may each be associated with compute logic for computing the dependency depth. Each SID vector may have a depth register 20 and an adder 21, which increments the depth register 20 for each new SID vector hit assigned.
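The bookkeeping described above can be sketched in software as follows. This is only an illustrative model, not the hardware of the embodiment: the class name SidTracker and its method names are hypothetical, and the SID vectors are modelled as lists of flags indexed by sequence number.

```python
class SidTracker:
    """Illustrative model of the SID structure: one flag vector per active speculation."""

    def __init__(self, num_sids, rob_entries):
        self.vectors = [[False] * rob_entries for _ in range(num_sids)]
        self.depth = [0] * num_sids        # models one depth register per SID vector
        self.free = list(range(num_sids))  # SID numbers currently not in use

    def _set(self, sid, sn):
        # Set the hit for sequence number sn; the "adder" bumps the depth
        # register only when a new hit is assigned.
        if not self.vectors[sid][sn]:
            self.vectors[sid][sn] = True
            self.depth[sid] += 1

    def start_speculation(self, load_sn):
        """Assign a SID to a predicted load (assumes a free SID exists)."""
        sid = self.free.pop(0)
        self._set(sid, load_sn)
        return sid

    def propagate(self, source_sns, dest_sn):
        """If any source has a hit in a SID vector, set the destination hit too."""
        for sid, vector in enumerate(self.vectors):
            if any(vector[sn] for sn in source_sns):
                self._set(sid, dest_sn)

    def resolve(self, sid):
        """On verification: report the dependency depth, then clear the vector."""
        depth = self.depth[sid]
        self.vectors[sid] = [False] * len(self.vectors[sid])
        self.depth[sid] = 0
        self.free.append(sid)
        return depth


tracker = SidTracker(num_sids=2, rob_entries=5)
sid = tracker.start_speculation(load_sn=0)       # predicted load at SN0
tracker.propagate(source_sns=[0], dest_sn=1)     # dependent instruction at SN1
reported_depth = tracker.resolve(sid)            # value that would be sent to the VPU
```

In this model the reported depth counts every hit in the vector, including the hit of the predicted load itself, which follows the statement that the number of hits set in the vector is the dependency depth D.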

A simple example will now be described in connection with FIGS. 6 a–e in order to further illustrate how the dependency depth of load instructions can be registered and stored to be used in future speculation decisions as mentioned above. In this simplified example we assume that we have an out-of-order processor 1 with a reorder buffer 7 of five entries and with support for two value speculations in flight simultaneously. FIGS. 6 a–e illustrate the SID vectors SID0 and SID1 of the processor alongside a list of sequence numbers SN0–SN4 that correspond to the five entries in the reorder buffer. A pseudo assembly code 16 for the example processor is also illustrated in FIGS. 6 a–e. An arrow 17 shows the point of execution in the code throughout the figures.

FIG. 6 a illustrates that a load instruction has been executed speculatively and a value reg1 has been predicted. The hit that is set in the SID vector SID0 indicates the speculative execution of the load instruction.

FIG. 6 b illustrates that a subsequent add instruction has been executed, where the predicted value reg1 was used. For this purpose, the value reg1 was retrieved from the reorder buffer together with its sequence number SN0. An indication of a speculation dependency was found in vector SID0 for sequence number SN0, which means that the speculation dependency also exists for the result reg2 of the add instruction, associated with sequence number SN1. Thus a hit is set in vector SID0 in the position that corresponds to sequence number SN1.

The next instruction that is executed is another load instruction for a value reg4 as shown in FIG. 6 c. This load instruction is also subject to value prediction. The SID vector SID1 is used to keep track of the dependency of the speculative value reg4 as shown in the figure.

The next multiplication instruction depends on the speculative value reg4. This is detected when the speculative value reg4 is delivered from the reorder buffer together with the sequence number SN2. A hit is set for sequence number SN2 in vector SID1 and hence a hit is set in SID1 for sequence number SN3 also, as illustrated in FIG. 6 d. The result reg5 of the multiplication instruction is stored in the reorder buffer and associated with sequence number SN3.

The last instruction that is illustrated in this example is a subtract instruction. FIG. 6 e shows the status of the SID vectors after this instruction has been executed. The subtract instruction depends on two values, reg5 and reg2, which together depend on both of the earlier predictions. When the value bound to reg5 is delivered from the ROB to a reservation station along with sequence number SN3, the speculative hit set for SN3 in the vector SID1 is detected and the SN4 entry in the vector SID1 is set. The same applies to the value reg2 associated with sequence number SN1. The speculative hit for SN1 that is set in vector SID0 is detected in the reservation station and the hit in SID0 that corresponds to sequence number SN4 is set.

When the true value for the reg1 load instruction is received from the memory system, the VPU can order the vector SID0 to be cleared if the prediction turns out to be correct. According to this example, each of the vectors SID0 and SID1 is associated with a depth register that is incremented during the speculative execution. When the true value for reg1 is delivered, the depth register associated with the vector SID0 contains a number that is taken as the dependency depth for this prediction. The dependency depth is delivered to the value cache 10 when the vector SID0 is cleared. The dependency depth is stored and used to produce the dependency depth prediction, which is used later as a factor in future decisions whether to speculate or not.
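The sequence of FIGS. 6 a–e can be replayed with a small sketch. It is a simplified model under stated assumptions: the SID vectors are represented as sets of sequence numbers, the helper names mark and execute are hypothetical, and only the operands named in the text (reg1, reg4, reg5, reg2) are modelled as sources.

```python
# Each SID vector is modelled as the set of sequence numbers that have a hit set.
sid = {"SID0": set(), "SID1": set()}


def mark(vector, sn):
    """Set the hit for a predicted load in its assigned SID vector."""
    sid[vector].add(sn)


def execute(dest_sn, source_sns):
    """Propagate speculation dependency from the sources to the destination."""
    for hits in sid.values():
        if hits & set(source_sns):
            hits.add(dest_sn)


mark("SID0", 0)      # FIG. 6 a: load of reg1 is predicted (SN0)
execute(1, [0])      # FIG. 6 b: add produces reg2 (SN1) from reg1 (SN0)
mark("SID1", 2)      # FIG. 6 c: load of reg4 is predicted (SN2)
execute(3, [2])      # FIG. 6 d: multiply produces reg5 (SN3) from reg4 (SN2)
execute(4, [3, 1])   # FIG. 6 e: subtract (SN4) uses reg5 (SN3) and reg2 (SN1)

# Counting the hits per vector gives the value held by each depth register.
depth = {name: len(hits) for name, hits in sid.items()}
print(depth)  # {'SID0': 3, 'SID1': 3}
```

Under this model both predictions end with three hits set (SN0, SN1, SN4 in SID0 and SN2, SN3, SN4 in SID1), so a depth of three would be reported for each when its vector is cleared.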

In the case of a correct prediction for a load instruction the result of the load instruction can be discarded from the reorder buffer and written to the register file. The instructions that are marked as speculation dependent will be cleared as speculative as the associated SID vector is released (i.e. cleared of hits). The released SID vector is free to be used for subsequent value speculations.

If the prediction was incorrect, the reorder buffer and the associated SID vector(s) are flushed. The dependency depth for a mis-speculated value could be reported to the value cache 10, but it could be misleading if there were value dependent control flow instructions (branches) in the execution path.

The embodiments of the present invention that use dependency depth prediction as a factor in the speculation decision are not dependent on any particular processor architecture, but adapt well to many different types of processors.

An embodiment that uses current information on the dependency depth of the load instruction to be executed will be explained now. Compared to the dependency prediction schemes based on dependency chain history, the scheme based on the current dependency depth is forced to extract its information earlier in the execution pipe in order to decide whether or not to use value speculation at the time of instruction issue. It is not possible to derive the current dependency depth in all types of processors. However, it is possible in a processor with a decoupled instruction fetch unit, for instance a trace processor, which opens the microarchitecture to dependency depth calculation beforehand. If traces are decoded beforehand, the dependency depth of each load instruction execution can be calculated. As the trace construction is done off the critical path, the calculation of a dependency depth of a load instruction execution can be done as a trace pre-processing step if dependency depth extraction logic is added to the trace processor. A trace processor in which this scheme may be implemented is illustrated in “Processor Architectures”, Ch. 5.2, page 228, J. Silc, B. Robic, T. Ungerer, Springer Verlag, ISBN 3-540-64798-8. This trace processor includes a fill unit, which is where the trace is constructed. Augmenting this unit with dependency depth extraction logic will enable each delivered load instruction to carry with it the number of instructions dependent on the load value (in the trace). Thereby, when a trace and its load instructions are delivered to the issue step, each load instruction's dependency depth is also delivered. The actual dependency depth of a load instruction is thus delivered together with the load instruction to the issue step if the trace is executed as constructed.
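As a rough illustration of such a trace pre-processing step, the following sketch counts, for every load in a trace, the later instructions in the same trace that depend on the loaded value through registers. The trace representation (tuples of opcode, destination register and source registers) and the function name load_dependency_depths are assumptions made for the example; dependences through memory are not tracked.

```python
def load_dependency_depths(trace):
    """Return {index of load in trace: number of later trace instructions
    that depend, directly or transitively, on the loaded value}."""
    depths = {}
    depends_on = {}  # register -> set of load indices its current value depends on
    for index, (opcode, dest, sources) in enumerate(trace):
        # Collect every load that this instruction's operands depend on.
        loads = set()
        for reg in sources:
            loads |= depends_on.get(reg, set())
        for load_index in loads:
            depths[load_index] += 1
        if opcode == "load":
            depths[index] = 0
            loads = loads | {index}
        if dest is not None:
            # Overwriting a register replaces its old dependence set.
            depends_on[dest] = loads
    return depths


# Hypothetical trace with the same shape as the example of FIGS. 6 a-e.
trace = [
    ("load", "reg1", ["reg0"]),
    ("add", "reg2", ["reg1", "reg3"]),
    ("load", "reg4", ["reg0"]),
    ("mul", "reg5", ["reg4", "reg6"]),
    ("sub", "reg7", ["reg5", "reg2"]),
]
depths = load_dependency_depths(trace)
print(depths)  # {0: 2, 2: 2}
```

Note that this sketch counts only the dependent instructions and not the load itself, so its numbers are one lower than a count of hits in a SID vector for the same instruction sequence.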

The embodiments described above illustrate that more qualified speculation decisions in respect of value prediction can be obtained by means of taking the estimated gain of correct prediction into consideration. One estimate of the gain of correct prediction is cache hit/miss information, while another estimate is the dependency depth. Many different schemes that use the estimated gain of prediction as a factor in the speculation decision are possible and several examples have been mentioned above. Another example scheme takes both of the two types of estimates mentioned above, cache hit/miss information and dependency depth, into consideration when the decision whether to speculate or not is made.

The different embodiments focus on different indicators: the number of instructions dependent on a load instruction, and cache hit or cache miss. The above-mentioned prior art methods for selective value speculation based on instruction dependency predictions capture past dynamic behavior and are rather complex. However, these prior art methods suffer from the interaction of cache fetch actions. A long dynamic dependency chain might not be long the next time around. The contents of a cache might be different from time to time.

The embodiments are rather simple, but may still be very reliable. Many additional advantages may be derived from combinations of the “basic” embodiments. If for instance the cache hit scheme is combined with the dependency depth prediction scheme, it will only be decided to base execution on a value prediction when the load latency is long and the instruction window contains a large number of instructions dependent on the value to be loaded. The combination will add dynamics to the dependency depth prediction scheme and static use-information to the cache hit scheme. It will also use actual memory latency information, not just predicted.
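A combined decision of this kind might be sketched as follows. The function name should_speculate, the threshold values and the use of a saturating confidence counter for the success rate of previous predictions are illustrative assumptions, not values taken from the embodiments.

```python
def should_speculate(cache_hit, predicted_depth, confidence,
                     min_depth=4, min_confidence=2):
    """Decide whether to base execution on a value prediction.

    cache_hit       -- True if the data cache signals a hit for the load
    predicted_depth -- predicted number of instructions dependent on the load
    confidence      -- counter reflecting the success of earlier predictions
    min_depth, min_confidence -- hypothetical thresholds for this sketch
    """
    if cache_hit:
        # Short load latency: little time to gain from a correct prediction.
        return False
    if predicted_depth < min_depth:
        # Few dependent instructions: the gain does not justify the risk.
        return False
    return confidence >= min_confidence


decision_hit = should_speculate(cache_hit=True, predicted_depth=8, confidence=3)
decision_miss = should_speculate(cache_hit=False, predicted_depth=8, confidence=3)
decision_shallow = should_speculate(cache_hit=False, predicted_depth=1, confidence=3)
```

With these assumed thresholds only the second call yields a decision to speculate, since it is the only case with both a long load latency (cache miss) and a long predicted dependency chain.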

Thus the “basic” embodiments may be sorted into different classes. One way of classifying the embodiments is according to which part of the processor the information used in the speculation decision originates from. There are schemes that use information from the memory system and schemes that use information from the execution engine. Information relating to cache hit or miss signals is information originating from the memory system and information regarding dependency depth is information from the execution engine. Another way of classifying the schemes is according to the point in time when the information to be used is collected. The embodiments above that use predictions based on historical data use information from the past, while an unpredicted cache hit or miss signal or a current dependency depth is from the present. Each class of schemes has its strengths and weaknesses. By creating schemes that are combinations of different classes, the strength associated with one class may be used to counteract the weaknesses of other classes.

The major advantage of the present technology is that value prediction can be avoided in situations where it is unwise due to the risk involved. The present technology makes it possible to make an informed speculation decision by means of basing the decision not only on the success rate of previous predictions, but also on the estimated gain from a correct prediction. Avoiding value prediction when the estimated gain of correct prediction is small compared to the risk involved improves the performance of the processor.

1. A processing unit for executing instructions in a computer system, which processing unit includes a value prediction unit for producing value predictions of values associated with instructions, which value prediction unit includes decision logic for deciding whether or not a value prediction for a first value is to be output for use in an execution unit, wherein the decision logic is arranged to base its decision on information associated with the estimated time gain of execution based on a correct value prediction, wherein the information on which the decision is based includes data cache hit/miss information that provides an indication of the likelihood that the first value and/or the address of the first value is located in a data cache; wherein the information of which the decision is based further includes a dependency depth prediction, which is a prediction of the number of instructions that depend on the first value during speculative execution based on a prediction of the first value; and wherein the processing unit further includes a data structure for storing information indicative of the dependency depth during execution based on the value prediction of the first value and means for storing the dependency depth in the value prediction unit for use as a dependency depth prediction for subsequent value predictions of the first value.
 2. The processing unit according to claim 1, wherein the data cache hit/miss information relates to at least one actual data cache hit signal, such that the data cache hit/miss information with certainty indicates whether or not the first value and/or the address of the first value is located in a data cache.
 3. The processing unit according to claim 2, wherein the decision logic is prevented from outputting the value prediction for use in the execution unit when the data cache hit/miss information indicates that the first value is located in a data cache.
 4. The processing unit according to claim 2, wherein the decision logic is prevented from outputting the value prediction for use in the execution unit when the data cache hit/miss information indicates that the first value and the address of the first value each are located in a data cache.
 5. The processing unit according to claim 1, wherein the information on which the decision is based includes a data cache hit prediction, which is a prediction of whether or not the first value is located in a data cache.
 6. The processing unit according to claim 5, wherein the decision logic is prevented from outputting the value prediction for use in the execution unit when the data cache hit prediction predicts that the first value is located in the data cache.
 7. The processing unit according to claim 1, the information on which the decision is based further includes dependency depth information, which is information on the number of instructions that depend on the first value during speculative execution based on a prediction of the first value.
 8. The processing unit according to claim 7, wherein the decision logic is prevented from outputting the value prediction for use in the execution unit when the dependency depth information is within a predetermined range.
 9. The processing unit according to claim 1, wherein the decision logic is prevented from outputting the value prediction for use in the execution unit when the dependency depth prediction is within a predetermined range.
 10. The processing unit according to claim 1, wherein the information on which the decision is based further includes information on the success rate of previous value predictions of the first value.
 11. The processing unit according to claim 10, wherein the information on the success rate of previous value predictions of the first value includes a counter value and in that the decision logic is prevented from outputting the value prediction for use in the execution unit when the counter value is within a predetermined range.
 12. The processing unit according to claim 1, wherein the decision logic is arranged to assign different weight factors to different parts of the information on which the decision is based and to combine the weighted parts of the information to form a decision value; and in that the decision logic is prevented from outputting the value prediction for use in the execution unit when the decision value is within a predetermined range.
 13. A method in a processing unit for executing instructions in a computer system, which method includes the step of producing a value prediction for a first value associated with a first instruction and the step of deciding whether or not to output the value prediction for use in an execution unit, which decision is based on information associated with the estimated time gain of execution based on a correct value prediction, wherein the information on which the decision is based includes data cache hit/miss information that provides an indication of the likelihood that the first value and/or the address of the first value is located in a data cache; wherein the information on which the decision is based further includes a dependency depth prediction, which is a prediction of the number of instructions that depend of the first value during speculative execution based on a prediction of the first value; and wherein the method further includes a step of storing information indicative of the dependency depth in a data structure during execution based of the value prediction of the first value and a step of storing the dependency depth in a value prediction unit for use as a dependency depth prediction for subsequent value predictions of the first value.
 14. The method according to claim 13, wherein the data cache hit/miss information relates to at least one actual data cache hit signal, such that the data cache hit/miss information with certainty indicates whether or not the first value and/or the address of the first value is located in a data cache.
 15. The method according to claim 14, further comprising deciding not to output the value prediction for use in the execution unit when the data cache hit/miss information indicates that the first value is located in a data cache.
 16. The method according to claim 14, further comprising deciding not to output the value prediction for use in the execution unit when the data cache hit/miss information indicates that the first value and the address of the first value each are located in a data cache.
 17. The method according to claim 13, wherein the information on which the decision is based includes a data cache hit prediction (s4P), which is a prediction of whether or not the first value is located in a data cache.
 18. The method according to claim 17, further comprising deciding not to output the value prediction for use in the execution unit when the data cache hit prediction (s4P) predicts that the first value is located in the data cache.
 19. The method according to claim 13, wherein the information on which the decision is based further includes dependency depth information, which is information on the number of instructions that depend on the first value during speculative execution based on a prediction of the first value.
 20. The method according to claim 19, further comprising deciding not to output the value prediction for use in the execution unit when the dependency depth information is within a predetermined range.
 21. The method according to claim 13, further comprising deciding not to output the value prediction for use in the execution unit when the dependency depth prediction is within a predetermined range.
 22. The method according to claim 13, wherein the information on which the decision is based further includes information on the success rate of previous value predictions of the first value.
 23. The method according to claim 22, further comprising the information on the success rate of previous value predictions of the first value including a counter value and by deciding not to output the value prediction to the execution unit when the counter value is within a predetermined range.
 24. The method according to claim 13, further comprising assigning different weight factors to different parts of the information on which the decision is based and combining the weighted parts of the information to form a decision value; and by deciding not to output the value prediction to the execution unit when the decision value is within a predetermined range.